Decomposition Methods for Solving Finite-Horizon Large MDPs
Authors

Abstract
Conventional algorithms for solving Markov decision processes (MDPs) become intractable on large finite state and action spaces. Several studies have addressed this issue, but most of them treat only infinite-horizon MDPs. This paper is one of the first works to deal with non-stationary finite-horizon MDPs. It proposes a new decomposition approach that partitions the problem into smaller restricted MDPs; each restricted MDP is then solved independently, in a specific order, using the proposed hierarchical backward induction (HBI) algorithm, which is based on the backward induction (BI) algorithm. The local solutions are then combined to obtain a global solution. An example on racetrack problems demonstrates the performance of the proposed technique.
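As context for the abstract, the following is a minimal sketch of standard backward induction (BI) on a finite-horizon MDP, the building block on which the paper's HBI algorithm is based. The three-state toy MDP, its dynamics, and its reward function are illustrative assumptions for this sketch; they are not the paper's racetrack example, and the partitioning into restricted MDPs is not shown.

```python
# Backward induction (BI) on a tiny finite-horizon MDP.
# The MDP below is a hypothetical toy problem: action "a" moves one state
# to the right (capped at state 2), action "b" stays put, and taking "a"
# in state 1 earns reward 1.

H = 3                    # horizon (number of decision stages)
states = [0, 1, 2]
actions = ["a", "b"]

def transition(s, a):
    # deterministic toy dynamics
    return min(s + 1, 2) if a == "a" else s

def reward(s, a):
    return 1.0 if (s == 1 and a == "a") else 0.0

# V[t][s] = optimal value with t stages to go; policy[t][s] = best action.
V = [{s: 0.0 for s in states}]       # terminal values V_0(s) = 0
policy = []
for t in range(1, H + 1):
    Vt, pi_t = {}, {}
    for s in states:
        # Bellman backup: pick the action maximizing immediate reward
        # plus the value of the successor state at the previous stage.
        best = max(actions, key=lambda a: reward(s, a) + V[-1][transition(s, a)])
        pi_t[s] = best
        Vt[s] = reward(s, best) + V[-1][transition(s, best)]
    V.append(Vt)
    policy.append(pi_t)

print(V[H])   # optimal values at the initial stage
```

Because values are propagated backwards from the terminal stage, the resulting policy is non-stationary: `policy[t]` can differ across stages, which is exactly the setting the paper targets.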
Similar resources
Lazy Approximation for Solving Continuous Finite-Horizon MDPs
Solving Markov decision processes (MDPs) with continuous state spaces is a challenge due to, among other problems, the well-known curse of dimensionality. Nevertheless, numerous real-world applications such as transportation planning and telescope observation scheduling exhibit a critical dependence on continuous states. Current approaches to continuous-state MDPs include discretizing their tra...
Lazy Approximation: A New Approach for Solving Continuous Finite-Horizon MDPs
Trial-Based Heuristic Tree Search for Finite Horizon MDPs
Dynamic programming is a well-known approach for solving MDPs. In large state spaces, asynchronous versions like Real-Time Dynamic Programming (RTDP) have been applied successfully. If unfolded into equivalent trees, Monte-Carlo Tree Search algorithms are a valid alternative. UCT, the most popular representative, obtains good anytime behavior by guiding the search towards promising areas of the...
Divide-and-Conquer Methods for Solving MDPs
The Markov Decision Process (MDP) is the principal theoretical formalism in the area of Reinforcement Learning (RL). An import from optimal control in operations research, this construct is generic enough to represent problems comprising almost all of AI research, but consequently, it suffers from the curse of dimensionality where learning involves an exponential number of parameters. Researche...
Analysis of methods for solving MDPs
New proofs for two extensions to value iteration are derived when the type of initialisation of the value function is considered. Theoretical requirements that guarantee the convergence of backward value iteration and weaker requirements for the convergence of backups based on best actions only are identified. Experimental results show that standard value iteration performs significantly faster...
Journal
عنوان ژورنال: Journal of Mathematics
سال: 2022
ISSN: 2314-4785, 2314-4629
DOI: https://doi.org/10.1155/2022/8404716